WikiReading: A Novel Large-scale Language Understanding Task over Wikipedia
Authors
Abstract
We present WIKIREADING, a large-scale natural language understanding task and publicly available dataset with 18 million instances. The task is to predict textual values from the structured knowledge base Wikidata by reading the text of the corresponding Wikipedia articles. The task contains a rich variety of challenging classification and extraction sub-tasks, making it well-suited for end-to-end models such as deep neural networks (DNNs). We compare various state-of-the-art DNN-based architectures for document classification, information extraction, and question answering. We find that models supporting a rich answer space, such as word or character sequences, perform best. Our best-performing model, a word-level sequence-to-sequence model with a mechanism to copy out-of-vocabulary words, obtains an accuracy of 71.8%.
Similar resources
Hierarchical Question Answering for Long Documents
Reading an article and answering questions about its content is a fundamental task for natural language understanding. While most successful neural approaches to this problem rely on recurrent neural networks (RNNs), training RNNs over long documents can be prohibitively slow. We present a novel framework for question answering that can efficiently scale to longer documents while maintaining or...
WebNav: A New Large-Scale Task for Natural Language based Sequential Decision Making
We propose a goal-driven web navigation as a benchmark task for evaluating an agent with abilities to understand natural language and plan on partially observed environments. In this challenging task, an agent navigates through a web site, which is represented as a graph consisting of web pages as nodes and hyperlinks as directed edges, to find a web page in which a query appears. The agent is ...
DBpedia Abstracts: A Large-Scale, Open, Multilingual NLP Training Corpus
The ever increasing importance of machine learning in Natural Language Processing is accompanied by an equally increasing need in large-scale training and evaluation corpora. Due to its size, its openness and relative quality, the Wikipedia has already been a source of such data, but on a limited scale. This paper introduces the DBpedia Abstract Corpus, a large-scale, open corpus of annotated W...
The Links Have It: Infobox Generation by Summarization over Linked Entities
Online encyclopedia such as Wikipedia has become one of the best sources of knowledge. Much effort has been devoted to expanding and enriching the structured data by automatic information extraction from unstructured text in Wikipedia. Although remarkable progresses have been made, their effectiveness and efficiency is still limited as they try to tackle an extremely difficult natural language ...
Iranian EFL Learners’ Motivational Fluctuation in Task Performance over Different Timescales
Motivation for learning a new language is both self and time-oriented. The language learner’s motivation experiences gradual fluctuation over time and the view of oneself is different on each timescale of the study. Interaction among different timescales throughout the Second Language Development (SLD) is a novel area of investigation (de Bot, 2015). In order to probe this interactive nature, t...
Journal title:
- CoRR
Volume abs/1608.03542, Issue -
Pages -
Publication date 2016